Integration of Intonation in F0 Trajectory Prediction Using MSD-HMMs

Authors

  • Xiaojun Zou
  • Xiao Bao
  • Lidong Luo
Abstract

Current research in speech synthesis places increasing emphasis on spectral continuity and diverse prosodic effects. The trainable HMM-based speech synthesis method tends to generate more continuous spectral structures than the traditional unit selection method. However, the F0 trajectory generated by HMM-based speech synthesis is often excessively smoothed and lacks prosodic variance. This paper proposes an approach to improve F0 trajectory prediction in Mandarin speech synthesis within the framework of multi-space probability distribution HMMs (MSD-HMMs). In the proposed approach, the intonation, which is predicted by context-dependent decision trees, is integrated into the F0 trajectory generated by the MSD-HMMs as a weighted bias term. Experiments indicate that the approach yields an encouraging improvement in the prosodic effectiveness of Mandarin speech synthesis.
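The integration step described in the abstract can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes F0 is handled in the log domain, that unvoiced frames (the discrete space of the MSD) are marked `None` and left untouched, and that one intonation bias is predicted per syllable. The tree structure, context features, bias values, weight, and the names `predict_intonation` and `integrate_intonation` are all hypothetical; the paper's actual context questions, bias units, and weighting are not given on this page.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

# Hypothetical linguistic context of one syllable, e.g. {"position": "final"}.
Context = Dict[str, object]


@dataclass
class Leaf:
    bias: float  # assumed: intonation offset in the log-F0 domain


@dataclass
class Node:
    question: Callable[[Context], bool]  # yes/no question about the context
    yes: "Tree"
    no: "Tree"


Tree = Union[Node, Leaf]


def predict_intonation(tree: Tree, ctx: Context) -> float:
    """Walk the context-dependent decision tree down to a leaf; return its bias."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.question(ctx) else tree.no
    return tree.bias


def integrate_intonation(
    log_f0: List[Optional[float]],
    frame_to_syllable: List[int],
    syllable_contexts: List[Context],
    tree: Tree,
    weight: float,
) -> List[Optional[float]]:
    """Add a weighted intonation bias to the voiced frames of an HMM log-F0 track.

    Unvoiced frames (None) are passed through unchanged, so the
    voiced/unvoiced decisions of the MSD-HMM are preserved.
    """
    biases = [predict_intonation(tree, ctx) for ctx in syllable_contexts]
    out: List[Optional[float]] = []
    for value, syl in zip(log_f0, frame_to_syllable):
        out.append(None if value is None else value + weight * biases[syl])
    return out


# Toy usage: lower pitch on sentence-final syllables, raise it on initial ones.
tree = Node(
    question=lambda c: c["position"] == "final",
    yes=Leaf(bias=-0.10),
    no=Node(
        question=lambda c: c["position"] == "initial",
        yes=Leaf(bias=0.05),
        no=Leaf(bias=0.0),
    ),
)
contexts = [{"position": "initial"}, {"position": "medial"}, {"position": "final"}]
hmm_log_f0 = [math.log(200.0), math.log(205.0), None, math.log(190.0), math.log(180.0)]
frames = [0, 0, 1, 2, 2]  # syllable index of each frame
result = integrate_intonation(hmm_log_f0, frames, contexts, tree, weight=0.5)
```

Restricting the bias to voiced frames leaves the MSD-HMM's voicing decisions intact, while the weight controls how far the smoothed HMM trajectory is pushed toward the predicted intonation; a weight of zero reproduces the original HMM output.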


Similar resources

Improved generation of prosodic features in HMM-based Mandarin speech synthesis

The HMM-based Text-to-Speech System can produce high quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the prosodic features, like F0 and duration trajectories, generated by HMM-based speech synthesis are often excessively smoothed and lack prosodic variance. In HMM-based TTS, durations are typically modeled statistically using state duration probabili...


Generating natural F0 trajectory with additive trees

In HMM-based TTS, while the segmental quality of synthesized speech is quite acceptable, intonation, especially at the sentence level, tends to be somewhat bland. The maximum likelihood (ML) criterion used in HMM training and parameter trajectory generation is partially responsible for the blandness. Additionally, the F0 trajectory thus generated has a smaller dynamic range than that of natural...


Emotion conversion using F0 segment selection

This paper describes F0 segment selection, a novel syllable-based F0 conversion method, which provides a concatenative framework to search for F0 segments in a modest corpus of emotional speech (∼15 minutes of data). The method is compared with our earlier work on F0 generation using context-sensitive syllable HMMs. Both methods are complemented with a duration conversion module as well as GMM-ba...


Using Zero-Frequency Resonator to Extract Multilingual Intonation Structure

Humans use expressive intonation to convey linguistic and paralinguistic meaning, especially focal prominence, which gives emphasis by highlighting the focus of speech. Automatically extracting dynamic intonation features from a speech corpus and representing them in a continuous form is desirable in multilingual speech synthesis. This paper presents a method to extract dynamic prosodic structure ...


Automatic Intonation Event Detection Using Tilt Model for Croatian Speech Synthesis

Text-to-speech systems convert text into speech. Synthesized speech without prosody sounds unnatural and monotonous. In order to sound natural, prosodic elements have to be implemented. The generation of prosodic elements directly from text is a rather demanding task. Our final goals are building a complete prosodic model for Croatian and implementing it into our TTS system. In this work, we pr...



Publication date: 2009